Strikeout Betting Analysis (WIP)

Have you ever wondered if it's possible to consistently profit from sports betting? For two months, from June to August, I put my own money on the line ($200, to be exact) using simple, easy-to-follow betting methods, creating a comprehensive dataset to analyze. To clarify, this method uses third-party projections; this project is not about building our own. Rather, it seeks to find out whether a bettor can use third-party projections to identify gaps between the true odds of an event and the posted odds at a sportsbook, a practice called positive EV betting.

Exploratory Data Analysis

To begin, let's look at some descriptive statistics to get a better sense of the data.

First, let's answer the big juicy question: how much money did this experiment make?

The first calculation is the percentage return from betting (the percentage return on the total dollar amount wagered).

The second calculation is the return on the initial investment, which was $200.
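As a minimal sketch, the two calculations can be expressed as simple functions. The numbers below are made up for illustration, not the experiment's actual figures:

```python
def pct_return_on_wagered(net_profit, total_wagered):
    """Percentage return relative to the total dollar amount wagered."""
    return 100 * net_profit / total_wagered

def return_on_bankroll(net_profit, bankroll=200.0):
    """Percentage return relative to the initial $200 bankroll."""
    return 100 * net_profit / bankroll

# Hypothetical example: $50 profit on $1,000 total wagered.
profit = 50.0
wagered = 1000.0
print(pct_return_on_wagered(profit, wagered))  # 5.0
print(return_on_bankroll(profit))              # 25.0
```

The gap between the two numbers is the point: because the same bankroll is wagered over and over, a modest edge per dollar wagered can compound into a much larger return on the initial stake.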

What most people might be interested in is the return on the initial investment, and as you can see, it is quite high for a two-month experiment. However, please do not quit your job just yet and throw your life savings into sports betting using this method. THIS EXPERIMENT IS NOT INVESTMENT ADVICE. The day-to-day swings in cumulative returns were still quite wild, suggesting that two months was not nearly enough time to draw conclusions about the profitability of this method.

After seeing the money-making potential, I bet you're willing to sit through the explanation now. Online sportsbooks allow you to bet on a large variety of events, from wins and scoring to individual player performance and other sport-specific events. For baseball, one such event is the number of strikeouts a pitcher records in a game. A quick refresher: it's three strikes and you're out! Using the projection tool SaberSim, I retrieved the daily strikeout projections for each pitcher and compared them to their over-under line for strikeouts. If the over-under line, or O/U, is 4.5, for example, you can either bet that the pitcher throws at least 5 strikeouts or at most 4. Using the projection, I calculated the probability of either outcome by assuming a Poisson distribution, then calculated the betting edge based on the posted odds at the sportsbook. Finding the "edge" means identifying bets that have a higher probability of winning than the odds suggest. Over time, this gives you a positive expected value in returns, assuming the betting edge is accurate. SaberSim has its own complex model that takes in play-by-play data from the past 10 years and simulates games play-by-play to produce projections. More on how SaberSim works for baseball can be found here
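A hedged sketch of the edge calculation described above, assuming American-style odds and a Poisson model. The function names and example numbers are my own illustration, not SaberSim's code or the exact script used in the experiment:

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed directly from the pmf."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))

def implied_prob(american_odds):
    """Break-even win probability implied by American odds (e.g. -110 or +120)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def edge(projection, line, american_odds, side="over"):
    """Model win probability minus the sportsbook's implied probability."""
    # For a 4.5 line, "under" means X <= 4, "over" means X >= 5.
    p_under = poisson_cdf(math.floor(line), projection)
    p_win = (1 - p_under) if side == "over" else p_under
    return p_win - implied_prob(american_odds)

# Hypothetical: projection of 6.0 strikeouts against a 4.5 line at -110 odds.
print(edge(6.0, 4.5, -110, side="over"))
```

If the result is positive, the model believes the bet wins more often than the break-even rate; the experiment only placed bets when this edge exceeded a threshold.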

Here is more of the math behind this experiment, if you're interested. To calculate the probability of the pitcher throwing either over or under the given betting line, we assume strikeouts approximately follow a Poisson distribution. I have not yet rigorously tested the validity of this assumption (I will soon), but so far, it has worked fairly well. The Poisson distribution is a discrete distribution that measures the probability of a given number of events happening in a specified period of time, if these events occur with a known constant mean rate and independently of the time since the last event. The first part is satisfied pretty easily: a given number of events (strikeouts) happening in a specified period (a game). It is the second part where we have to potentially shoehorn in further assumptions. We have to assume that the projection serves as the mean rate for a given pitcher on that given day (because the projections are updated for each of their appearances) and that each strikeout event is independent of the others (which you could argue is not entirely true due to psychological factors). At this point, there are two major unknowns: how well pitcher strikeouts are approximated by a Poisson distribution, and how well the projections approximate the mean of that distribution. These questions need further testing and will require data that is not present in this dataset. My hope is that this analysis will also lead us to ask more insightful statistical questions further down the road.
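One quick, informal way to probe the Poisson assumption later would be a dispersion check: a Poisson distribution's variance equals its mean, so a variance-to-mean ratio far from 1 in real strikeout counts would be a red flag (real data often shows overdispersion). Since the real counts aren't in this dataset, the sketch below only demonstrates the check on simulated Poisson draws, generated with Knuth's algorithm:

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate via Knuth's multiplication algorithm."""
    threshold = math.exp(-mu)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(42)
draws = [sample_poisson(5.0, rng) for _ in range(10_000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
dispersion = var / mean  # ~1 for truly Poisson data; >>1 suggests overdispersion
print(mean, dispersion)
```

Running the same ratio on actual per-pitcher strikeout histories would be a cheap first test before anything more rigorous like a chi-square goodness-of-fit test.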

Let's first look at the distribution of all the strikeout projections.

It's slightly skewed to the right, with a mean of 5.09.

The strikeout projections from SaberSim are essentially the averages of all their simulated games, so we can think of that as an average of a "sample" of simulated games. This distribution can then be thought of as the sampling distribution of pitcher strikeouts. However, the pitchers recorded in this dataset are starting pitchers, which means that they rotate on a schedule of around 5 days. This means that the data are not independent because the same pitchers show up several times and will have similar projections based on their previous performance. To get an independent sample of individual pitcher distributions, we can randomly pick one instance of a projection for each pitcher.
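A minimal sketch of that resampling step, using made-up pitcher names and projections (the real dataset's field names may differ):

```python
import random

# Hypothetical (pitcher, projection) rows; starters appear multiple times.
rows = [
    ("Cole", 6.8), ("Cole", 7.1),
    ("Burnes", 6.2), ("Burnes", 6.5),
    ("Gray", 4.9),
]

# Group each pitcher's projections together.
by_pitcher = {}
for name, proj in rows:
    by_pitcher.setdefault(name, []).append(proj)

# Randomly keep one projection per pitcher -> an (approximately) independent sample.
rng = random.Random(0)
sample = {name: rng.choice(projs) for name, projs in by_pitcher.items()}
print(sample)
```

Each pitcher now contributes exactly one observation, which removes the repeated-measures dependence (though pitchers on the same team or in the same ballpark are still not perfectly independent).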

The results here show a slightly lower mean at 4.89 and the majority of projections between 3 and 6 strikeouts. In terms of likelihood to win bets overall, it looks favorable to bet towards this range, especially when the betting line is below 3 or above 6.

To get a better idea of which O/U lines are more favorable to bet on, we can look at each line's winnings and win percentage. We will start with 'Model 1', which uses only the SaberSim strikeout projection to bet. There is another model, 'Model 2', which uses an additional projection alongside SaberSim.

Note: We will only be analyzing 'Model 1' because 'Model 2' was even more experimental and a bit unnecessarily complicated for our purposes.

This is a lot to take in, but let's break it down. The first column, 'Net_sum', shows the 'profit' made from each OU line. 'Bet_Amt_sum' is the total amount wagered on the corresponding OU line. As you can see, the vast majority of bets were made on the '3.5', '4.5', and '5.5' lines, which is what we expect based on the distribution of strikeout projections: most projections fall between 3 and 6. The lines posted by the sportsbook are a combination of the sportsbook's own projections and how people are betting. It's in the sportsbook's best interest to split the bettors as close to 50/50 as possible, so they like to make accurate projections too. The 'Wins' and 'Losses' columns should be self-explanatory. 'Unders' and 'Overs' count the number of bets made under or over the line. 'Win_Pct' should also be pretty self-explanatory.
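For illustration, here is roughly how such a summary table could be built from raw bet records; the records and field names below are hypothetical stand-ins for the actual dataset:

```python
# Each record: OU line, net result ($), amount wagered ($), outcome, and side.
bets = [
    {"ou_line": 4.5, "net": 9.1,   "bet_amt": 10.0, "won": True,  "side": "over"},
    {"ou_line": 4.5, "net": -10.0, "bet_amt": 10.0, "won": False, "side": "under"},
    {"ou_line": 5.5, "net": 8.7,   "bet_amt": 10.0, "won": True,  "side": "under"},
]

summary = {}
for b in bets:
    s = summary.setdefault(b["ou_line"], {
        "Net_sum": 0.0, "Bet_Amt_sum": 0.0,
        "Wins": 0, "Losses": 0, "Overs": 0, "Unders": 0,
    })
    s["Net_sum"] += b["net"]
    s["Bet_Amt_sum"] += b["bet_amt"]
    s["Wins" if b["won"] else "Losses"] += 1
    s["Overs" if b["side"] == "over" else "Unders"] += 1

for s in summary.values():
    s["Win_Pct"] = s["Wins"] / (s["Wins"] + s["Losses"])
```

In practice this is a one-line `groupby` aggregation in pandas, but the logic per line is the same: sum the nets and wagers, count wins, losses, overs, and unders, then derive the win percentage.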

Looking at the number of 'Unders' and 'Overs', we can see that the bets tended to be made towards the mean of 4.89 ('3.5' had more 'Over' bets and '5.5' had more 'Under' bets). Bets were only placed when the model calculated a betting edge of over 5% (meaning, according to the model, the win probability exceeds the break-even probability by at least 5 percentage points), so the model also tended to suggest betting towards the mean.

The success of betting towards the mean is reflected in the win percentages for '3.5', '4.5', '5.5', and even '2.5'. However, the same cannot be said for the profits on '6.5', '7.5', and '4.5' ('4.5' won barely more than 50% of the time, which was not enough to profit overall). This is likely due to the model breaking down at some point(s) on the distribution. Remember, we very roughly assumed that strikeouts and strikeout projections fit a Poisson distribution, but if the model breaks down somewhere, we know the Poisson distribution doesn't fit completely. We could sit here and think about intuitive reasons why the Poisson distribution doesn't fit for these OU lines (which I have), but I would rather test this empirically later on. Nevertheless, the model in its current state seems to work pretty well for the OU lines '2.5', '3.5', '5.5', and possibly '8.5'. It could be a sound strategy to only bet on these lines using the current model. Another adjustment could be to increase the betting edge threshold from 5% to, say, 10%, so we only place more advantageous bets, potentially covering for more of the error in the model. We can't increase it too much, though, or there won't be enough bets to place.

Statistical Analysis

From the charts and visualizations above, we have learned quite a few things about the data. We talked about how the model works, saw how it behaved, and learned which kinds of bets were more favorable than others. Now we can ask the most important question of all, the one most important to anyone with their hard-earned money on the line: was it luck?

Bootstrap

To answer this question, we can think of our data as a sample from a population. Then we can bootstrap it to calculate a confidence interval and run a hypothesis test with the null hypothesis that the true net profit is 0. Bootstrapping gives us more flexibility because it makes no assumptions about the distribution of our data. (Our sample size is probably big enough to assume normality of the sampling distribution, but I will admit that my wager amounts were not entirely consistent, so we'll try to make fewer assumptions.) Now let's bootstrap the results of Model 1 to get an idea of whether we got lucky or whether the method is reliably profitable.
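A sketch of the percentile-bootstrap procedure, using made-up per-bet net results; this is one common way to build the interval, not necessarily the exact code behind the figures:

```python
import random

def bootstrap_ci(nets, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for total net profit: resample the per-bet
    results with replacement, sum each resample, take the alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(nets)
    sums = sorted(
        sum(rng.choice(nets) for _ in range(n))
        for _ in range(n_boot)
    )
    lo = sums[int(n_boot * alpha / 2)]
    hi = sums[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical per-bet net results in dollars (the real column comes from the dataset).
nets = [9.1, -10.0, 8.7, -10.0, 9.5, -10.0, 18.0, 8.2, -10.0, 9.0]
lo, hi = bootstrap_ci(nets)
print(lo, hi)  # if the interval includes 0, we cannot reject the null of zero profit
```

Resampling whole bets (net result per bet) rather than standardized returns is what lets the inconsistent wager sizes ride along without extra distributional assumptions.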

So this is not looking good at first glance. I don't think we need to calculate a confidence interval to see that the model does not reliably make money (our 95% CI needs to exclude 0 for us to reject the null hypothesis). However, we have to remember that choosing which bets to make is a complex interaction between the SaberSim projection, the betting line, the betting odds, the calculated edge, and the fit of the Poisson distribution. The projections, the lines, and the way people bet also change across a season. The "population" of outcomes under this model can look very different as the season progresses. It's possible that certain times in the season are more profitable, where the model gives better betting advantages. Therefore, we may not be able to generalize the results to all parts of a season. There are two separate one-month periods where I mainly used 'Model 1': 6/14 - 7/15 and 8/10 - 9/8. During the period in between, I experimented with another model.

We can try to bootstrap each period separately. I would expect the results to be quite different as there was a large drop-off in cumulative win percentage in the second period.
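Splitting the records by date before bootstrapping could look like this. The individual records and the year (2021) are assumptions for illustration; only the two date windows come from the experiment:

```python
from datetime import date

# Hypothetical (date, net) records spanning both periods.
records = [
    (date(2021, 6, 20), 5.0),
    (date(2021, 7, 1), -3.0),
    (date(2021, 8, 15), -6.0),
    (date(2021, 9, 1), 2.0),
]

# Period 1: 6/14 - 7/15; Period 2: 8/10 - 9/8.
period1 = [net for d, net in records if date(2021, 6, 14) <= d <= date(2021, 7, 15)]
period2 = [net for d, net in records if date(2021, 8, 10) <= d <= date(2021, 9, 8)]
```

Each filtered list would then be fed separately to the same bootstrap routine, producing one confidence interval per period.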

Bootstrap 6/14 - 7/15 and 8/10 - 9/8

Now there is a clear difference between the two time periods. The first period earns significantly more, and the second period loses substantially. However, the 95% CI for the first period still includes 0.

Bootstrap for specific OU lines

What if we pretended that, knowing what we know now, we went back in time and changed the model to wager only on the OU lines that we observed to be most profitable? Could that give us the best chance of guaranteeing ourselves a profit?

The CI still includes 0, but this is across both periods. Let's try filtering for both the most profitable OU lines and the first period.

The 95% CI here is above 0. Now it seems we have a "method" for making a profit with statistical significance at the 95% confidence level.

Summary

From our statistical analysis, we have determined a specific method and timeframe for which our model is likely to profit. (Making a note to myself in case I ever need extra cash.) We have to remember, however, that there are many complex factors involved, and this is a good stepping stone to designing a more controlled experiment and analysis now that we know what to look for. For future reference, I would set a strict method for determining wager amounts (it was at my own discretion this time), tinker with the betting edge threshold, and run the experiment more consistently across a season. A good related project would be to collect historical strikeout data and test how well the Poisson distribution fits.

P.S. This kind of reeks of p-hacking, I must say. However, take it as an exploratory analysis to give us a better sense of how the betting outcomes behave and how all the different factors interact. It's a starting point for designing a more well-thought-out experiment.